The number of cells being analyzed in single-cell projects continues to push the envelope. Last week, the Chan Zuckerberg Initiative (CZI) launched the Billion Cells Project, an effort to generate an unprecedented one-billion-cell dataset to fuel rapid progress in AI model development in biology. CZI will collaborate with 10x Genomics, Ultima Genomics, and researchers on the project.
“CZI’s Billion Cells Project illustrates the power of collaboration to make previously unfathomable amounts of single-cell data available for researchers, which will help clarify our understanding of the fundamental biology underpinning human health and disease while supercharging efforts at the intersection of AI and biology,” said Jonah Cool, PhD, cell science senior science program officer at CZI. “Biology not only needs more data—the field needs more data faster and in interoperable formats to support AI models that address specific problems, and this project represents a unique approach to scaling and standardizing scientific outputs for AI and more.”
Once completed, the single-cell data set will bring new data and resolution to areas of biology such as mapping genetic perturbations across diverse cell types and tissues. Because of its scale and cohesion, the data contributed to the Billion Cells Project will be more consistent than data from past efforts. This will enable researchers to train AI models and make discoveries across precision medicine and functional genomics.
Data generated from the Billion Cells Project will be used to train new virtual cell models using CZI’s computing system to derive greater insight into the vast dataset. As part of its commitment to open science to accelerate research and make science more inclusive, CZI plans to make the results from this initiative open source and freely available to help scientists around the world make new discoveries about human biology.
CZI is partnering with an initial cohort of experts in the single-cell field and developers of innovative life science technologies. This partnership prioritizes data generation across diverse biological domains, with initial data sets including organisms such as mouse, zebrafish, and primary human cell models. These data will provide researchers immediate insight into areas that include gene regulation and function across organisms, and insights directly related to disease research.
“This project will provide a necessary scale of data to understand the functional effects of human genetic variants and characterize the genetic drivers of human disease,” said Alexander Marson, MD, PhD, director of the Gladstone-UCSF Institute of Genomic Immunology and collaborator. “Ultimately, the Billion Cells Project will also be a functional roadmap to guide drug development, identifying targets to restore diseased cells to health.”
This collaboration will use 10x Genomics’ Chromium GEM-X technology for single-cell analysis. Sequencing will be performed on the UG 100, an ultra-high-throughput next-generation sequencing (NGS) platform developed by Ultima Genomics. The UG 100’s wafer-based sequencing architecture enables high-throughput sequencing, making the platform well suited to large-scale omics applications, including the generation of genomic data.
“Our mission is to continuously increase the scale and lower the costs of genomic information, and our UG 100 platform helps fulfill that vision by enabling high-throughput generation of large-scale omics data sets at low costs,” said Gilad Almogy, PhD, founder and CEO of Ultima Genomics. “Initiatives like the Billion Cells Project showcase how advances enabled by our innovative sequencing architecture, combined with the emerging advances in AI and machine learning, will enable a transformational approach to research that will accelerate scientific discovery and our understanding of biology’s complexity.”